In this work, we present the first corpus for German adverse drug reaction (ADR) detection in patient-generated content. The data consists of 4,169 binary-annotated documents from a German patient forum, where users discuss health issues and receive advice from medical doctors. As is common for social media data in this domain, the class labels of the corpus are heavily imbalanced. This, together with a high topic imbalance, makes it a very challenging dataset, since the same symptom can often have several causes and is not always related to medication intake. We aim to encourage further multilingual efforts in the domain of ADR detection and provide preliminary experiments for binary classification using zero- and few-shot learning approaches based on a multilingual model. When fine-tuning XLM-RoBERTa first on English patient forum data and then on the new German data, we achieve an F1 score of 37.52 for the positive class. We make the dataset and models publicly available for the community.
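A minimal sketch of the sequential fine-tuning setup described above, assuming HuggingFace transformers; the dataset variables and hyperparameters are placeholders, not the authors' exact configuration:

    # Sketch: fine-tune XLM-RoBERTa on English ADR data first, then on German data.
    # en_train / de_train are placeholder tokenized datasets.
    from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                              Trainer, TrainingArguments)

    tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
    model = AutoModelForSequenceClassification.from_pretrained(
        "xlm-roberta-base", num_labels=2)  # binary: ADR / no ADR

    def finetune(model, dataset, output_dir):
        args = TrainingArguments(output_dir=output_dir,
                                 num_train_epochs=3,
                                 per_device_train_batch_size=16)
        Trainer(model=model, args=args, train_dataset=dataset).train()
        return model

    # Stage 1: English patient forum data; stage 2: the new German corpus.
    model = finetune(model, en_train, "xlmr-adr-en")
    model = finetune(model, de_train, "xlmr-adr-en-de")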
Many models for neural content-based news recommendation have been proposed. However, there is limited understanding of the relative importance of the three main components of such systems (news encoder, user encoder, and scoring function) and of the trade-offs involved. In this paper, we assess the hypothesis that the most widely used means of matching user and candidate news representations is not expressive enough. We allow the system to model more complex relations between the two by evaluating more expressive scoring functions. Across a wide range of baseline and established systems, this results in consistent improvements of around 6 points in AUC. Our results also indicate a trade-off between the complexity of the news encoder and that of the scoring function: a fairly simple baseline model scores well above 68% AUC on the MIND dataset, coming within 2 points of the published state of the art while requiring only a fraction of the computational cost.
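The trade-off can be made concrete by contrasting the standard dot-product scorer with a more expressive learned scorer. A hedged PyTorch sketch; dimensions and architecture are illustrative, not the paper's exact design:

    import torch
    import torch.nn as nn

    class DotScorer(nn.Module):
        """The widely used baseline: score = <user, news>."""
        def forward(self, user, news):          # (B, d), (B, d)
            return (user * news).sum(dim=-1)    # (B,)

    class MLPScorer(nn.Module):
        """A more expressive scorer that can model non-linear interactions."""
        def __init__(self, dim):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, 1))
        def forward(self, user, news):
            return self.net(torch.cat([user, news], dim=-1)).squeeze(-1)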
Text readability assessment has a wide range of applications for different target audiences, from language learners to people with disabilities. The fast pace of textual content production on the web makes it impossible to measure text complexity without the help of machine learning and natural language processing techniques. Although various studies have addressed the readability assessment of English text in recent years, there is still room for improving models for other languages. In this paper, we propose a new model for text complexity assessment of German text based on transfer learning. Our results show that the model outperforms more classical solutions based on linguistic features extracted from the input text. The best model, based on the BERT pre-trained language model, achieves a root mean square error (RMSE) of 0.483.
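A minimal sketch of the regression setup implied by the RMSE metric, assuming a German BERT checkpoint and (text, complexity score) pairs; the model identifier is an assumption, not necessarily the authors' choice:

    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    # num_labels=1 turns the classification head into a regression head
    # (transformers then trains with MSE loss).
    tokenizer = AutoTokenizer.from_pretrained("bert-base-german-cased")
    model = AutoModelForSequenceClassification.from_pretrained(
        "bert-base-german-cased", num_labels=1)

    def rmse(predictions, targets):
        """Root mean square error between predicted and gold complexity scores."""
        return torch.sqrt(torch.mean((predictions - targets) ** 2)).item()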
Background: In the information extraction and natural language processing domain, accessible datasets are crucial for reproducing and comparing results. Publicly available implementations and tools can serve as benchmarks and facilitate the development of more complex applications. However, in the context of clinical text processing, the number of accessible datasets is scarce, and so is the number of existing tools. One of the main reasons is the sensitivity of the data. This problem is even more pronounced for non-English languages. Methods: To address this situation, we introduce a workbench: a collection of German clinical text processing models. The models are trained on a de-identified corpus of German nephrology reports. Results: The presented models provide promising results on in-domain data. Moreover, we show that our models can also be successfully applied to other biomedical texts in German. Our workbench is publicly available, so it can be used out of the box or transferred to related problems.
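Applying such workbench models would typically look like the following hedged sketch; the model identifier is a placeholder for illustration, not the actual released checkpoint:

    from transformers import pipeline

    # Placeholder identifier; substitute the actual published workbench model.
    ner = pipeline("token-classification",
                   model="example-org/german-nephrology-ner",
                   aggregation_strategy="simple")

    report = "Der Patient erhielt Tacrolimus nach der Nierentransplantation."
    for entity in ner(report):
        print(entity["entity_group"], entity["word"], round(entity["score"], 2))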
The human-centered explainable artificial intelligence (HCXAI) community has proposed framing the explanation process as a conversation between human and machine. In this position paper, we establish desiderata for text-based conversational agents capable of explaining the behavior of neural models interactively using natural language. From the perspective of natural language processing (NLP) research, we engineer a blueprint of such a mediator for the task of sentiment analysis and assess how far current research has come on the path toward dialogue-based explanations.
The detection of hate speech online has become an important task, as offensive language such as hurtful, obscene, and insulting content can harm marginalized people and groups. This paper presents our experiments and results for subtasks 1A and 1B of the shared task on Hate Speech and Offensive Content Identification in Indo-European Languages. Throughout the competition, the success of different natural language processing models was evaluated on the various subtasks. We tested different models, ranging from recurrent neural networks operating on the word and character level to transfer learning approaches, on the dataset provided by the competition. Among the tested models, the transfer learning-based models achieved the best results in both subtasks.
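A minimal sketch of one of the tested model families, a character-level recurrent classifier in PyTorch; vocabulary size and dimensions are illustrative:

    import torch
    import torch.nn as nn

    class CharRNNClassifier(nn.Module):
        """Character-level GRU for binary offensive-content classification."""
        def __init__(self, vocab_size=256, embed_dim=64, hidden_dim=128):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, embed_dim)
            self.gru = nn.GRU(embed_dim, hidden_dim, batch_first=True)
            self.out = nn.Linear(hidden_dim, 2)   # offensive / not offensive
        def forward(self, char_ids):              # (B, T) integer char codes
            _, h = self.gru(self.embed(char_ids))
            return self.out(h[-1])                # (B, 2) logits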
Logic Mill is a scalable and openly accessible software system that identifies semantically similar documents within either one domain-specific corpus or multi-domain corpora. It uses advanced Natural Language Processing (NLP) techniques to generate numerical representations of documents. Currently it leverages a large pre-trained language model to generate these document representations. The system focuses on scientific publications and patent documents and contains more than 200 million documents. It is easily accessible via a simple Application Programming Interface (API) or via a web interface. Moreover, it is continuously being updated and can be extended to text corpora from other domains. We see this system as a general-purpose tool for future research applications in the social sciences and other domains.
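The underlying technique, generating dense document representations and ranking by similarity, can be illustrated with the following sketch; it uses the sentence-transformers library and an example model as stand-ins and is not the actual Logic Mill API:

    from sentence_transformers import SentenceTransformer, util

    # Illustrative model choice, not the one used by Logic Mill.
    model = SentenceTransformer("all-MiniLM-L6-v2")

    corpus = ["A method for quantum error correction ...",
              "Deep learning for protein structure prediction ..."]
    query = "Error mitigation techniques for quantum computers"

    corpus_emb = model.encode(corpus, convert_to_tensor=True)
    query_emb = model.encode(query, convert_to_tensor=True)
    scores = util.cos_sim(query_emb, corpus_emb)  # cosine similarity matrix
    print(scores)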
The analysis of network structure is essential to many scientific areas, ranging from biology to sociology. As the computational task of clustering these networks into partitions, i.e., solving the community detection problem, is generally NP-hard, heuristic solutions are indispensable. The exploration of expedient heuristics has led to the development of particularly promising approaches in the emerging technology of quantum computing. Motivated by the substantial hardware demands for all established quantum community detection approaches, we introduce a novel QUBO based approach that only needs number-of-nodes many qubits and is represented by a QUBO-matrix as sparse as the input graph's adjacency matrix. The substantial improvement on the sparsity of the QUBO-matrix, which is typically very dense in related work, is achieved through the novel concept of separation-nodes. Instead of assigning every node to a community directly, this approach relies on the identification of a separation-node set, which -- upon its removal from the graph -- yields a set of connected components, representing the core components of the communities. Employing a greedy heuristic to assign the nodes from the separation-node sets to the identified community cores, subsequent experimental results yield a proof of concept. This work hence displays a promising approach to NISQ ready quantum community detection, catalyzing the application of quantum computers for the network structure analysis of large scale, real world problem instances.
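The post-processing step described above, recovering community cores after removing a separation-node set and greedily re-attaching the removed nodes, can be sketched with networkx; the separation-node set itself would come from solving the QUBO, which is taken as given here:

    import networkx as nx

    def communities_from_separation_nodes(graph, separation_nodes):
        """Community cores = connected components after removing the
        separation nodes; each separation node is then greedily assigned
        to the core where it has the most neighbors."""
        core_graph = graph.copy()
        core_graph.remove_nodes_from(separation_nodes)
        communities = [set(c) for c in nx.connected_components(core_graph)]
        for node in separation_nodes:
            neighbors = set(graph[node])
            best = max(communities,
                       key=lambda c: len(c & neighbors), default=None)
            if best is not None:
                best.add(node)
            else:
                communities.append({node})
        return communities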
The following article presents a memetic algorithm that applies deep reinforcement learning (DRL) for solving practically oriented dual resource constrained flexible job shop scheduling problems (DRC-FJSSP). In recent years, there has been extensive research on DRL techniques, but without considering realistic, flexible and human-centered shopfloors. A research gap can be identified in the context of make-to-order oriented discontinuous manufacturing as it is often represented in medium-sized companies with high service levels. From practical industry projects in this domain, we recognize requirements to depict flexible machines, human workers and capabilities, setup and processing operations, material arrival times, complex job paths with parallel tasks for bill of material (BOM) manufacturing, sequence-dependent setup times and (partially) automated tasks. On the other hand, intensive research has been done on metaheuristics in the context of DRC-FJSSP. However, there is a lack of suitable and generic scheduling methods that can be holistically applied in sociotechnical production and assembly processes. In this paper, we first formulate an extended DRC-FJSSP induced by the practical requirements mentioned. Then we present our proposed hybrid framework with parallel computing for multicriteria optimization. Through numerical experiments with real-world data, we confirm that the framework generates feasible schedules efficiently and reliably. Utilizing DRL instead of random operations leads to better results and outperforms traditional approaches.
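At a high level, the hybrid approach couples a population-based memetic loop with a learned variation operator in place of random mutation. A heavily simplified, hedged skeleton; all callables are placeholders for the paper's actual components:

    import random

    def memetic_search(initial_population, evaluate, local_search,
                       drl_operator, generations=100):
        """Skeleton of a memetic algorithm where a DRL policy proposes
        schedule modifications instead of purely random mutation.
        evaluate() returns a cost; lower is assumed better."""
        population = [local_search(ind) for ind in initial_population]
        for _ in range(generations):
            parents = random.sample(population, 2)
            child = drl_operator(parents)          # learned variation, not random
            child = local_search(child)            # memetic refinement step
            worst = max(population, key=evaluate)
            if evaluate(child) < evaluate(worst):  # steady-state replacement
                population[population.index(worst)] = child
        return min(population, key=evaluate)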
The acquisition of high-quality human annotations through crowdsourcing platforms like Amazon Mechanical Turk (MTurk) is more challenging than expected. The annotation quality might be affected by various aspects such as annotation instructions, Human Intelligence Task (HIT) design, and wages paid to annotators. To avoid potentially low-quality annotations which could mislead the evaluation of automatic summarization system outputs, we investigate the recruitment of high-quality MTurk workers via a three-step qualification pipeline. We show that we can successfully filter out bad workers before they carry out the evaluations and obtain high-quality annotations while optimizing the use of resources. This paper can serve as a basis for the recruitment of qualified annotators in other challenging annotation tasks.
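A hedged sketch of the kind of filter such a qualification pipeline applies; the three steps and thresholds below are illustrative assumptions, not the paper's exact criteria:

    def passes_qualification(worker):
        """Illustrative three-step filter: platform track record, a paid
        qualification HIT scored against gold annotations, and agreement
        with other annotators on overlapping items."""
        steps = [
            worker["approval_rate"] >= 0.98,    # step 1: platform history
            worker["gold_accuracy"] >= 0.8,     # step 2: qualification HIT
            worker["agreement_kappa"] >= 0.6,   # step 3: annotator agreement
        ]
        return all(steps)

    workers = [{"id": "W1", "approval_rate": 0.99,
                "gold_accuracy": 0.85, "agreement_kappa": 0.7}]
    qualified = [w["id"] for w in workers if passes_qualification(w)]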